Incorporation of a Valency Lexicon into a TectoMT Pipeline

نویسندگان

  • Natalia Klyueva
  • Vladislav Kuboň
چکیده

In this paper, we focus on the incorporation of a valency lexicon into TectoMT system for Czech-Russian language pair. We demonstrate valency errors in MT output and describe how the introduction of a lexicon influenced the translation results. Though there was no impact on BLEU score, the manual inspection of concrete cases showed some improvement.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Czechizator - Čechizátor

We present a lexicon-less rule-based machine translation system from English to Czech, based on a very limited amount of transformation rules. Its core is a novel translation module, implemented as a component of the TectoMT translation system, and depends massively on the extensive pipeline of linguistic preprocessing and postprocessing within TectoMT. Its scope is naturally limited, but for s...

متن کامل

From PropBank to EngValLex: Adapting the PropBank-Lexicon to the Valency Theory of the Functional Generative Description

EngValLex is the name of an FGD-compliant valency lexicon of English verbs, built from the PropBank-Lexicon and following the structure of Vallex, the FGD-based lexicon of Czech verbs. EngValLex is interlinked with the PropBank-Lexicon, thus preserving the original links between the PropBank-Lexicon and the PropBank-Corpus. Therefore it is also supposed to be part of corpus annotation. This pap...

متن کامل

Building the PDT-Vallex valency lexicon

In our contribution, we relate the development of a richly annotated corpus and a computational valency lexicon. Our valency lexicon, called PDT-Vallex (Hajič et al., 2003) has been created as a “byproduct” of the annotation of the Prague Dependency Treebank (PDT) but it became an important resource for further linguistic research as well as for computational processing of the Czech language. W...

متن کامل

Annotation Lexicons: Using the Valency Lexicon for Tectogrammatical Annotation

We present a formalization of the valency theory (Panevová, 1974) that fits the stratificational representation scheme used in the Prague Dependency Treebank. The notion of a lexicon as a repository of “static” (invariable, or context-independent) source of information is formally presented; a different type of lexicon is used at every layer of sentence representation, with a formal link to thi...

متن کامل

Valency Lexicon of Czech Verbs: Towards Formal Description of Valency and Its Modeling in an Electronic Language Resource

Valency refers to the capacity of verb (or a word belonging to another part of speech) to take a specific number and type of syntactically dependent language units. Valency information is thus related to particular lexemes and as such it is necessary to describe valency characteristics for separate lexemes in the form of lexicon entries. A valency lexicon is indispensable for any complex Natura...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016